UPSTREAM PR #18849: Deepseek v3.2 dense attention support from @fairydreaming #923

Open
loci-dev wants to merge 1 commit into main from upstream-PR18849-branch_createthis-deepseek_v3_2

@loci-dev

Mirrored from ggml-org/llama.cpp#18849

This is a bare-minimum implementation of DeepSeek V3.2 using dense attention only. @fairydreaming wrote this code; I just packaged it into a PR.

I've generated GGUFs with this: https://huggingface.co/createthis/DeepSeek-V3.2-dense-GGUF

I then ran inference with them out to a context of 48300 tokens over several turns. It seems to work fine.

The major issue is that the sparse attention tensors are left out of the GGUF.

If this is unacceptable, I have another PR from back in October that populates the sparse attention tensors in the GGUF but still doesn't use them for inference. I abandoned that PR because generation became degenerate at about 45k context. Now that I know this PR works, I can attempt to fix the other PR.

Let me know what you think.

@loci-review

loci-review bot commented Jan 14, 2026

Explore the complete analysis inside the Version Insights

Based on the analysis, no functions were identified with meaningful performance changes between the base and target versions. The code modifications did not result in measurable performance impact.

loci-dev force-pushed the main branch 27 times, most recently from b96fcb2 to 9e5f0e1 (January 19, 2026 23:09)
loci-dev force-pushed the main branch 30 times, most recently from 881552d to d592984 (January 26, 2026 15:13)